1. INTRODUCTION

The development of neural network modeling applied to time series data has grown rapidly. Some 
interesting problems under investigation include techniques for determining the optimal input, the number 
of units in the hidden layer [1]-[3], the activation function in the hidden layer [4], [5], and the selection of 
optimization methods to obtain the optimal weights [6]-[8]. Modeling procedures to obtain the optimal architecture 
have also been developed through theoretical, applied, and computational studies [9]-[11]. For the task of 
finding the optimal weights, the optimization method is one of the main points of discussion. 
Gradient-based optimization techniques have become the standard approach to this problem. As a consequence, 
the activation function used must be continuous and differentiable. Non-gradient metaheuristic optimization 
methods have also progressed considerably [12], [13], supported by advances in computation. Advances in 
statistical and mathematical modeling have likewise generated various alternative models that aim at better 
predictions. As these models grow more complex, appropriate optimization techniques are needed to obtain 
parameter estimates. 

The development of metaheuristic algorithms as optimization techniques opens a new chapter in 
statistical modeling. These non-gradient methods are useful for parameter estimation in alternative 
models. Optimization methods in this category include the genetic algorithm [14]-[16], ant 
colony [17], [18], simulated annealing [19], and particle swarm optimization [20], [21]. The use of these 
methods for parameter estimation of statistical models is still limited. Previous studies have demonstrated the 
superiority of metaheuristic optimization techniques in terms of estimation results. However, a drawback 
has also been found: the iteration time required to reach convergence is longer than for 
gradient-based methods. The purpose of this study is to find the best optimization method among three 
metaheuristic algorithms, namely genetic algorithm (GA), particle swarm optimization (PSO), and modified 
artificial bee colony (MABC). The procedure is applied to rainfall, a data type known to have a seasonal 
pattern. The comparison is based on the accuracy of in-sample and out-sample predictions. 


2. RESEARCH METHOD 

This section describes the modeling stages carried out. The explanation is divided into two main parts. 
The first part describes the algorithm of neural network modeling and its architecture in the case of time series. 
The second part discusses the metaheuristic optimization methods used to obtain the optimal weights of the 
neural network model. There are three methods discussed, namely GA, PSO, and MABC. The proposed 
procedure is applied to ten-daily rainfall data, a time series with a seasonal component. At the data 
processing stage, predictions are made for the training and testing data, and the accuracy of the three 
optimization methods is then compared. A brief explanation of each part follows. 


2.1. Feed forward neural networks 

A neural network (NN) is a modeling algorithm inspired by the human brain that mimics the 
signaling between biological neurons. The main class of neural network models is the feed forward neural network 
(FFNN), and backpropagation is its most popular learning method. In an FFNN, one or more hidden layers are 
inserted between the input and the output. The FFNN is the type of neural network most often used in various 
applications, including time series [22], [23]. Its architecture contains a number of simple processing units 
arranged in layers. The units in each layer are called neurons or nodes. Each neuron is 
connected to every neuron in the next layer. The strength of the connection between units is expressed 
as a weight, which varies with how strong the connection between the neurons is. 

The complexity of a neural network model is determined by how many units are in each layer. The more 
complex the network, the more weights must be estimated. The type and number of units in the input layer 
are largely determined by the purpose of the model. In classification problems, the network 
input is determined first. In time series problems, the input is chosen according to how strongly the 
lagged variables are related to the current value, which serves as the output target. The 
number of units in the output layer depends on whether the analysis is univariate or multivariate. Modeling 
procedures for applying FFNN to univariate time series continue to develop. In the univariate case, 
the number of neurons in the output layer is one and the output is given by (1). 


$$y = f^{o}\left(\sum_{j} w_{j}^{o}\, f_{j}^{h}\left(\sum_{i} w_{ji}^{h} x_{i}\right)\right) \tag{1}$$

where $f^{o}$ is the activation function in the output layer, $w_{j}^{o}$ is the weight from hidden unit $j$ to the output, $f_{j}^{h}$ is the 
activation function in hidden unit $j$, and $w_{ji}^{h}$ is the weight from input $i$ to hidden unit $j$. Equation (1) has no bias; one can be 
added as an extra input. The model also accommodates a different activation function for each unit in the hidden layer. If 
all of them are the same, i.e. $f^{h}$, then (1) becomes: 

$$y = f^{o}\left(w^{b} + \sum_{j} w_{j}^{o}\, f^{h}\left(w_{j}^{b} + \sum_{i} w_{ji}^{h} x_{i}\right)\right) \tag{2}$$

where $w^{b}$ is the weight from the bias to the output and $w_{j}^{b}$ is the weight from the bias to hidden unit $j$. From the perspective 
of time series modeling, the past data series is the input of the model, while the present value $x_{t}$ is the output. Hence, if 
the inputs are the lagged values 1 to $p$, i.e. $x_{t-1}, \ldots, x_{t-p}$, then (2) becomes: 

$$x_{t} = f^{o}\left(w^{b} + \sum_{j=1}^{n} w_{j}^{o}\, f^{h}\left(w_{j}^{b} + \sum_{i=1}^{p} w_{ji}^{h} x_{t-i}\right)\right) \tag{3}$$

The configuration of the FFNN architecture for time series is as follows. The input layer consists of $x_{t-1}$ up to $x_{t-p}$ 
and a bias. The hidden layer consists of n neurons while the output contains one neuron. This architecture is 
represented in Figure 1. 
Figure 1. FFNN architecture for time series modeling 
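
As a minimal sketch of the forward pass in (3), the function below computes one prediction from the lagged inputs. It assumes a logistic sigmoid in the hidden layer and a linear output, as used later in this study; the function and argument names are illustrative and not taken from the original implementation.

```python
import math


def logistic(a):
    """Logistic sigmoid activation for the hidden units."""
    return 1.0 / (1.0 + math.exp(-a))


def ffnn_forward(x_lags, w_hidden, b_hidden, w_out, b_out):
    """One-step prediction with the structure of (3).

    x_lags   : list of p lagged inputs x_{t-1}, ..., x_{t-p}
    w_hidden : n lists of p input-to-hidden weights
    b_hidden : n bias-to-hidden weights
    w_out    : n hidden-to-output weights
    b_out    : bias-to-output weight
    """
    total = b_out
    for w_j, b_j, wo_j in zip(w_hidden, b_hidden, w_out):
        net_j = b_j + sum(w * x for w, x in zip(w_j, x_lags))
        total += wo_j * logistic(net_j)
    return total  # linear (identity) output activation


# Usage with p = 3 inputs and n = 2 hidden units (arbitrary numbers).
x_hat = ffnn_forward([0.2, 0.1, 0.4],
                     [[0.5, -0.3, 0.1], [0.2, 0.2, -0.4]],
                     [0.0, 0.1], [1.0, -1.0], 0.05)
```

With all hidden weights set to zero, every hidden unit outputs 0.5, which gives a quick sanity check of the wiring.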


The backpropagation algorithm in neural network modeling consists of three sequential steps: 
feedforward, error calculation, and weight adjustment. In the feedforward stage, each input unit sends a signal 
to all neurons in the hidden layer, and the hidden layer in turn sends signals to the output. The difference 
between the target and the output then gives the error. The weights are updated with a chosen optimization 
method to reduce the error. With the new weights, the feedforward stage is repeated. This 
procedure continues until a stopping criterion is reached, either the maximum epoch or the minimum error. 
In this study, metaheuristic optimization is used to update the weights. The main objective of 
metaheuristic optimization in feed forward neural network modeling is to get the best prediction resulting from 
the optimal weights. 
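
A metaheuristic optimizer only needs a function that maps a candidate weight vector to an error value. The sketch below shows one way to write such an objective for the FFNN, using the mean square error. The flat layout of the weight vector is an assumption made for illustration (the paper does not specify one), as are the function names.

```python
import math


def predict(weights, x_lags, n_hidden):
    """FFNN output for one input pattern from a flat weight vector.

    Assumed (hypothetical) layout: for each hidden unit, p input weights
    followed by its bias; then n_hidden output weights; then the output bias.
    Total length: n_hidden * (p + 1) + n_hidden + 1.
    """
    p = len(x_lags)
    out_start = n_hidden * (p + 1)
    y = weights[-1]  # bias-to-output weight
    for j in range(n_hidden):
        block = weights[j * (p + 1):(j + 1) * (p + 1)]
        net = block[p] + sum(w * x for w, x in zip(block[:p], x_lags))
        y += weights[out_start + j] / (1.0 + math.exp(-net))  # logistic hidden unit
    return y


def mse(weights, inputs, targets, n_hidden):
    """Mean square error of the network over a data set."""
    errors = [t - predict(weights, x, n_hidden) for x, t in zip(inputs, targets)]
    return sum(e * e for e in errors) / len(errors)
```

Any of the three optimizers can then minimize `mse` over the weight vector without requiring derivatives of the activation functions.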


2.2. Genetic algorithm (GA) 

GA is a metaheuristic search that is routinely used to obtain good solutions for optimization and 
search problems [24]. The algorithm imitates natural genetic mechanisms, searching for 
the best gene structures in an organism. The theory of evolution is the basis of genetic 
algorithms: species with better adaptability have a greater chance of survival. The algorithm 
begins with the initialization of a set of potential solutions, which is then adjusted through iterations to obtain 
the best solution. Each potential solution is encoded as a chromosome of predetermined length, formed 
from a binary alphabet. The set of chromosomes represents a population. The iteration stages 
in the process of chromosome evolution are called generations. The evolution process consists of selection, 
crossover, and mutation. In each generation, the process of evolution produces a new generation, or offspring. 
Genetic algorithms have several characteristics: the search is carried out on a population of points, not 
from a single point; they work on an encoding of the parameters, not directly on the parameters 
themselves; they use information from the fitness function or objective function, not its derivatives; and 
random operations are performed in each iteration with probabilistic rules, not with deterministic rules. 

The candidate solutions are encoded in the form of chromosomes. Each chromosome contains 
genes and all chromosomes have the same length. The element in each gene is taken from a binary alphabet. In the regeneration process, each 
chromosome $x$ is assigned the fitness value $f(x)$. The choice of chromosome length, alphabet, and encoding, 
i.e. the mapping from the universe of discourse ($\Omega$) to a set of chromosomes, 
is called a representation scheme. The selection operation is the first step of the genetic algorithm after the initial 
population $P(0)$ is formed. At this stage, a mating pool $M(k)$ is formed whose number of elements is identical to 
the number of elements in $P(k)$. Every point $m(k)$ in $M(k)$ is taken from the points $x(k)$ in $P(k)$. The next stage 
is the evolution process. At this stage, crossover is carried out by taking a pair of chromosomes as parents, 
which produce a new pair of chromosomes called offspring. Parent pairs are selected at random from 
$M(k)$ for crossover with probability $p_{c}$. The mutation process is then applied with a small probability, 
$p_{m} \ll 1$, by changing or reversing the value of one or more genes in a chromosome. Good chromosomes are 
preserved to survive in the next generation. Elitism is a supporting strategy that saves good 
chromosomes from the previous generation so that they are preserved in the next generation. Linear fitness 
ranking (LFR) is another useful strategy, which assigns fitness scores based on the ranking of individuals. This 
strategy reduces the effect of a large variance in the fitness values obtained, which helps 
to avoid premature convergence to a local optimum. To get the chromosomes with the best fitness, 
this procedure is repeated until a stopping criterion is fulfilled. 

The steps of the genetic algorithm can be summarized as follows: i) form the initial population 
$P(0)$, ii) evaluate $P(k)$, iii) if the stopping criterion is fulfilled then stop, iv) select $M(k)$ from $P(k)$, v) evolve $M(k)$ 
to form $P(k + 1)$, and vi) set $k = k + 1$ and go to step ii. In this research, the specifications of the genetic algorithm 
were: population size = 20, number of generations = 10000, probability of crossover $p_{c}$ = 0.7, and probability of 
mutation $p_{m}$ = 0.1. Roulette wheel selection was used to select the parent pairs. 
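
The steps above can be sketched as a small binary-coded GA with roulette wheel selection, single-point crossover, mutation, and elitism. This is an illustrative sketch, not the implementation used in this study: the decoding range, the number of bits per parameter, the fitness transform 1/(1 + cost), and the choice to flip a single gene per mutation are all assumptions. The default number of generations is reduced to keep the example fast.

```python
import random


def decode(bits, n_params, lo=-5.0, hi=5.0):
    """Map a binary chromosome to n_params real values in [lo, hi]."""
    k = len(bits) // n_params
    values = []
    for i in range(n_params):
        gene = bits[i * k:(i + 1) * k]
        integer = int("".join(map(str, gene)), 2)
        values.append(lo + (hi - lo) * integer / (2 ** k - 1))
    return values


def roulette(population, fitness, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitness))
    running = 0.0
    for chrom, fit in zip(population, fitness):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]


def genetic_algorithm(cost, n_params, bits_per_param=16, pop_size=20,
                      generations=200, p_c=0.7, p_m=0.1, seed=0):
    rng = random.Random(seed)
    length = n_params * bits_per_param
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = min(pop, key=lambda c: cost(decode(c, n_params)))
    for _ in range(generations):
        # Assumed fitness transform: smaller cost gives larger positive fitness.
        fit = [1.0 / (1.0 + cost(decode(c, n_params))) for c in pop]
        new_pop = [best[:]]  # elitism: carry the best chromosome over
        while len(new_pop) < pop_size:
            a, b = roulette(pop, fit, rng)[:], roulette(pop, fit, rng)[:]
            if rng.random() < p_c:  # single-point crossover
                cut = rng.randrange(1, length)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                if rng.random() < p_m:  # mutation: flip one random gene
                    i = rng.randrange(length)
                    child[i] = 1 - child[i]
                new_pop.append(child)
        pop = new_pop[:pop_size]
        best = min(pop + [best], key=lambda c: cost(decode(c, n_params)))
    return decode(best, n_params)
```

For neural network training, `cost` would be the MSE of the network as a function of the decoded weight vector. Because of elitism, the best cost never increases from one generation to the next.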


TELKOMNIKA Telecommun Comput El Control, Vol. 19, No. 6, December 2021: 1892 - 1901 




2.3. Particle swarm optimization 

Particle swarm optimization (PSO) is motivated by the intelligent collective behavior of animals 
such as flocks of birds or schools of fish [25]. Among the various algorithms in swarm intelligence, PSO is 
considered the most important one [26]. It is a population-based stochastic optimization algorithm. In weight 
optimization with PSO, the initial position of the particles is generated randomly. At this initial position the 
particles are at rest, so the initial velocity is set to zero. Evaluation of the fitness value at this 
initial position is very important because it determines the first global best position (gBest) and the best 
individual positions (pBest). In this study, mean square error (MSE) is used as the fitness value. The PSO stages for 
optimizing FFNN weights are as follows: i) Determine the initial values, including the 
number of particles, the acceleration coefficients, the maximum number of iterations, and the inertia weight; 
ii) Set the initial velocity and randomly generate the initial position of each particle; iii) Calculate the network output based 
on the weights at the initial position; iv) Calculate the fitness value of each particle, then select the minimum 
MSE as the optimal fitness; v) Choose the best position of each particle (pBest) based on the fitness value. The best position 
found over all iterations becomes the global best (gBest); vi) Update the velocity and position of the particles; 
vii) Determine the new position; viii) Calculate the network output using the weights at the new position; 
and ix) Go to step iv. 

These stages continue until the stopping criteria are met. In this research, the maximum number of 
iterations was 500, and the population size (swarm size) was 10. The PSO parameters used in this 
research were: inertia weight = 1, inertia weight damping ratio = 0.99, personal learning coefficient = 1.5, and global 
learning coefficient = 2.0. 
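
The stages and parameter values above can be sketched as follows. The velocity update used here is the standard PSO rule with inertia weight and two learning coefficients; the paper does not print the formula, so this form, the search bounds `lo` and `hi`, and all names are assumptions for illustration.

```python
import random


def pso(cost, dim, swarm_size=10, max_iter=500, w=1.0, w_damp=0.99,
        c1=1.5, c2=2.0, lo=-5.0, hi=5.0, seed=0):
    """Minimize cost over dim real parameters; returns (gBest, its cost)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    vel = [[0.0] * dim for _ in range(swarm_size)]  # particles start at rest
    p_best = [p[:] for p in pos]
    p_cost = [cost(p) for p in pos]
    g = min(range(swarm_size), key=p_cost.__getitem__)
    g_best, g_cost = p_best[g][:], p_cost[g]
    for _ in range(max_iter):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard rule: inertia + personal pull + global pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            c = cost(pos[i])
            if c < p_cost[i]:  # update pBest, and gBest if improved
                p_best[i], p_cost[i] = pos[i][:], c
                if c < g_cost:
                    g_best, g_cost = pos[i][:], c
        w *= w_damp  # damp the inertia weight after each iteration
    return g_best, g_cost
```

With the MSE of the network as `cost` and the FFNN weights as the particle position, `g_best` holds the selected weights when the loop ends.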


2.4. Modified artificial bee colony 
In the search for an optimal solution, artificial bee colony (ABC) mimics the foraging behavior of a swarm of 
bees. It was developed by Karaboga and Basturk [27]. The performance of ABC is better than 
or equivalent to that of other population-based algorithms such as genetic algorithm, particle swarm optimization, differential 
evolution, and evolution strategies, with the advantage of using fewer control parameters [28]. The stages of the 
ABC algorithm are: 
1) Initialize the population using $x_{ij} = x_{\min,j} + \mathrm{rand}(0,1)\,(x_{\max,j} - x_{\min,j})$, where $x_{ij}$ is the position of 
population member $i$ in parameter $j$. 
2) Evaluate the population. 
3) Set iteration = 1. 
4) Repeat: 
a. Worker (employed) bee stage: 
- Generate a new solution: $v_{ij} = x_{ij} + \phi_{ij}(x_{ij} - x_{kj})$, where $v_{ij}$ is the new position of population member $i$ in 
parameter $j$, $x_{kj}$ is the position of another solution $k$, and $\phi_{ij}$ is a uniformly distributed random number in the range of -1 to 1. 
- Determine the fitness of the solution using $\mathrm{fitness}_{i} = \frac{1}{\mathrm{MSE}}$. 
- Compare $v_{ij}$ and $x_{ij}$. 
- Determine the probability using $p_{i} = \frac{\mathrm{fitness}_{i}}{\sum_{n=1}^{SN} \mathrm{fitness}_{n}}$, where $SN$ is the number of solutions. 
b. Guardian (onlooker) bee stage: 
- Choose the solution $x_{ij}$ based on $p_{i}$. 
- Generate a new $v_{ij}$. 
- Determine the fitness of the solution using $\mathrm{fitness}_{i} = \frac{1}{\mathrm{MSE}}$. 
- Compare $v_{ij}$ and $x_{ij}$. 
c. Surveillance (scout) bee stage: 
If there is an abandoned solution, replace it by generating a new solution $x_{i}$ randomly using $v_{ij} = x_{ij} + \phi_{ij}(x_{ij} - x_{kj})$. 
d. Save the best solution. 
e. Increase the iteration counter: iteration = iteration + 1. 
5) Until the requirements are met or iteration = maximum iteration. 
The modification of the ABC algorithm provides better convergence performance than 
the ordinary ABC algorithm [29]. The modification changes the ABC search formula as follows: 

$$v_{ij} = x_{ij} + \phi_{ij}(x_{ij} - x_{kj}) + \psi_{ij}(y_{j} - x_{ij})$$

where $\psi_{ij}$ is a uniformly distributed random number in the range of 0 to 1.5 and $y_{j}$ is the $j$th element of the 
best global solution. The formula is inspired by the search mechanism of the PSO algorithm and is used to 
improve the convergence of the ABC algorithm. In addition, the solution probability formula was also 
changed to: 


$$p_{i} = \exp\left(-\frac{1}{\rho}\,\mathrm{fitness}_{i}\right)$$

where $\rho = 2.5$. Research conducted by Shahrudin and Mahmuddin [29] showed that MABC performs better 
than ABC regardless of the value used in the simulation. 
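
The modified search move adds a pull toward the best global solution to the usual ABC neighbor move. The sketch below proposes one candidate with a neighbor coefficient drawn from [-1, 1] and a global-best coefficient drawn from [0, 1.5], then applies the greedy comparison between the candidate and the current solution. Changing a single randomly chosen parameter per move follows common ABC practice and is an assumption here, as are the function names.

```python
import random


def mabc_candidate(x, i, best, rng):
    """Propose a candidate for solution x[i] with the modified search formula."""
    k = rng.choice([m for m in range(len(x)) if m != i])  # partner k != i
    j = rng.randrange(len(x[i]))  # assumed: one parameter changes per move
    phi = rng.uniform(-1.0, 1.0)  # neighbor coefficient
    psi = rng.uniform(0.0, 1.5)   # global-best coefficient
    v = x[i][:]
    v[j] = x[i][j] + phi * (x[i][j] - x[k][j]) + psi * (best[j] - x[i][j])
    return v


def greedy_step(x, i, best, cost, rng):
    """Keep the candidate only if it lowers the cost (the 'compare' step)."""
    v = mabc_candidate(x, i, best, rng)
    if cost(v) < cost(x[i]):
        x[i] = v
    return x[i]
```

Setting the global-best coefficient to zero recovers the ordinary ABC move, which makes the two variants easy to compare in an experiment.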


3. RESULTS AND DISCUSSION 

In this study, we used ten-daily rainfall data from ZOM 138 Bawak Klaten, Central Java, from 
January 2010 until July 2018, a series of length 309. The data were taken from the Meteorology, Climatology and 
Geophysics Agency (BMKG). The data are divided into two parts. The first part is the training data used for 
in-sample prediction, while the second part is the testing data used for out-sample prediction. The 309 observations are divided into 273 
training data and the remaining 36 testing data. The proportion of training to testing data is close to, 
but not exactly, 90:10. The reason is that the data are ten-daily and 
seasonal, so that one year contains 36 observation points, and it is more sensible to use one full 
year of data (the last 36 observations) as testing data than an incomplete year. Since the input variables extend to lag 18, the effective training 
data is reduced to 255. Each optimization method was tried for several architectures. To match the 
number of inputs used, the number of hidden units ranges from one up to two times the number of inputs. In this case, there 
are three input variables, i.e. lags 1, 2, and 18, so the number of hidden units ranges from one to six. The activation 
functions used in the hidden layer and output layer are logistic sigmoid and linear, respectively. Table 1 shows the 
results of each experiment. 
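
The data layout described above can be illustrated with a short sketch that builds the lagged input patterns and reproduces the sample counts (255 effective training patterns and 36 testing patterns). The series here is a synthetic placeholder, not the BMKG data, and letting the test inputs look back into the end of the training period is an assumption about how the testing patterns were formed.

```python
def make_lagged(series, lags):
    """Build (inputs, targets) where each input row holds the lagged values."""
    max_lag = max(lags)
    inputs = [[series[t - lag] for lag in lags]
              for t in range(max_lag, len(series))]
    targets = list(series[max_lag:])
    return inputs, targets


# Placeholder series of length 309 with a 36-step cycle (not the real data).
series = [float(t % 36) for t in range(309)]
lags = (1, 2, 18)

train = series[:273]
x_train, y_train = make_lagged(train, lags)

# Assumed: test inputs may use the last 18 training values as lags.
x_test, y_test = make_lagged(series[273 - 18:], lags)

print(len(x_train), len(x_test))
```

The first 18 training observations are consumed as lags, which is why the 273 training observations yield 255 usable patterns.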

Table 1 shows that optimization of the neural network model using GA gives the best MSE 
on the testing data, i.e. for out-sample prediction. This occurs in the experiment with three hidden units, 
equal to the number of input variables, which agrees with the rule of thumb that the number of hidden units 
corresponds to the number of input units. Meanwhile, if the results of the in-sample prediction with training data 
are used as the basis for selection, the best method is PSO with five hidden units. However, the 
in-sample results of the GA method with four to six hidden units come close to the best 
result from PSO. This provides sufficient reason to choose GA as the best method, especially since 
the out-sample prediction of the PSO method with five hidden units is very poor. Thus, PSO is 
less successful in guaranteeing that good in-sample predictions will also produce good out-sample predictions. 
The results of the MABC method also give no reason to recommend it. The 
convergence graphs of GA and PSO in Figure 2 show how performance improves 
as the number of iterations increases. 

In Figure 2 (a), there is a rapid decline at the beginning of the iterations, or generations, 
of the GA optimization. The points displayed are the best fitness and the average fitness value of the population, 
i.e. of the set of chromosomes. The fluctuations show the rise and fall of the average fitness. 
The average fitness indicates that each generation contains both very bad and very good 
chromosomes. The chromosome that produces the best fitness value is preserved in the next generation to ensure 
convergence. The process towards convergence requires relatively many iterations, which indicates that the 
error decreases slowly but steadily. Similarly, the PSO optimization in 
Figure 2 (b) also shows a very steep decline at the beginning of the iterations. The displayed value is the best 
value for each iteration. The iteration process towards convergence in PSO is very fast: convergence 
already appears after 200 iterations. In terms of the number of iterations, PSO is more efficient, since it requires fewer 
iterations than GA. However, the time required for one iteration in PSO is longer than in GA. Overall, 
both methods converge to their optimal MSE values. 

A more interesting discussion arises when looking at the averages and variances of each 
method. In terms of the mean value, GA is superior in both in-sample and out-sample 
predictions, giving the lowest average MSE. PSO 
provides out-sample prediction results that are similar to MABC but produces better in-sample predictions on 
average. The variances strengthen the case for GA because it has the smallest variance 
in both training and testing data. These results indicate that GA is the best in terms of the stability of the results. 
Meanwhile, MABC is more stable than PSO in out-sample prediction, whereas PSO is more stable 
than MABC in in-sample prediction. Based on this discussion, GA is recommended as the first 
choice of optimization method for the neural network model. This recommendation applies to rainfall 
data containing seasonal patterns, as restricted in this study. As an illustration, plots of the in-sample 
and out-sample predictions versus the actual values from the GA method are shown in 
Figure 3 and Figure 4. The in-sample predictions are obtained from the model using the training data, 
whereas the out-sample predictions are obtained using the testing data. In each plot, one line shows the 
actual values and the other shows the predictions. 
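
The averages and variances discussed here can be recomputed directly from the per-architecture MSE values. The sketch below does this for the PSO rows of Table 1 with Python's standard `statistics` module. The variance is the sample variance (divisor n - 1); that choice is an inference from the fact that it reproduces the tabulated values, not something stated in the text.

```python
import statistics

# PSO MSE values for one to six hidden units, as listed in Table 1.
pso_train = [2.2170, 2.3849, 2.2541, 2.2944, 2.1551, 2.2587]
pso_test = [2.7121, 2.8685, 2.6390, 2.5171, 3.1108, 2.5004]

for name, values in (("training", pso_train), ("testing", pso_test)):
    mean = statistics.mean(values)
    var = statistics.variance(values)  # sample variance, divisor n - 1
    print(f"PSO {name}: mean = {mean:.4f}, variance = {var:.4f}")
```

The same two calls applied to the GA and MABC rows give the figures behind the stability comparison.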


Table 1. Results of the experiments 

Metaheuristic optimization method   Number of hidden units   MSE (training)   MSE (testing)
Genetic algorithm                   1                        2.2133           2.5435
                                    2                        2.2196           2.7110
                                    3                        2.2263           2.4919
                                    4                        2.1658           2.7055
                                    5                        2.1707           2.4946
                                    6                        2.1703           2.5784
                                    average                  2.2044           2.5875
                                    variance                 0.0018           0.0098
Particle swarm optimization         1                        2.2170           2.7121
                                    2                        2.3849           2.8685
                                    3                        2.2541           2.6390
                                    4                        2.2944           2.5171
                                    5                        2.1551           3.1108
                                    6                        2.2587           2.5004
                                    average                  2.2607           2.72465
                                    variance                 0.0059           0.0541
Modified artificial bee colony      1                        2.4027           2.6378
                                    2                        2.3953           2.6468
                                    3                        2.5396           2.8032
                                    4                        2.2446           212
                                    5                        2.4242           2.8645
                                    6                        2.3840           2.8003
                                    average                  2.3984           2.7213
                                    variance                 0.0089           0.0135

Figure 2. Convergence graph of the (a) GA and (b) PSO 
The plots in Figure 3 and Figure 4 indicate that the seasonal patterns in rainfall data can be approximated and 
predicted well by the model. However, the proposed modeling procedure seems unable 
to capture extreme values. When the data contain extremely high values, the model does not 
predict them well. This is the biggest source of error of the model used. The seasonal pattern of the rainfall data has 
been approximated successfully by the model, but the extreme values have not. This is a weakness 
of the proposed modeling procedure and leaves room for finding models or methods that are more 
suitable for this type of data. 


Figure 3. Plot of actual vs in-sample prediction of neural network with genetic algorithm 


Figure 4. Plot of actual vs out-sample prediction of neural network with genetic algorithm 


A review of previously reported studies was carried out to evaluate the results obtained from this 
research. Most of the results are in accordance with previous studies. Metaheuristic optimization 
methods have also been proven successful in optimizing the adaptive neuro-fuzzy inference system (ANFIS) 
model for rainfall data [30]. In that report, PSO, GA, and their hybrid are suitable for optimizing the model and 
are better than the classical method. In [31], GA appears to be superior to PSO for optimizing job shop scheduling 
problems. Similarly, as reported in [32], GA also produced the highest performance in estimating the heating 
load of building energy efficiency for smart city planning, superior to PSO, imperial competitive algorithm 
(ICA), and ABC. Likewise, the three metaheuristic methods have been used for optimizing support vector 
machine (SVM) in the case of classification [33]. The GA had the highest average overall accuracy, followed 
by ABC and PSO. In contrast, in [34] ABC is better than or similar to GA and PSO with the advantage of 
employing fewer control parameters. Research involving all three methods with equally good results was 
obtained in [35]. In this report, is more successful in evolving larger networks and the PSO is more successful 
on smaller networks. Ramdania et al. [36] reported that the PSO fitness value outperforms the genetic 
algorithm, but the genetic algorithm executes faster than the PSO algorithm. PSO involves less overall 
computational effort than GA and was shown to outperform GA for smaller population sizes [37]. Slightly 
different results were obtained in [38] and [39], where PSO gave better results than GA and ABC. 
It should be noted that most of this research was applied to problems other than seasonal time series. 
The characteristics of the data therefore have a strong influence on the results obtained. 


4. CONCLUSION 

Three metaheuristic-based optimization methods have been used to determine the optimal weights of 
the neural network model for rainfall data. Across the architectures examined, optimization with the 
genetic algorithm is recommended for models and data of this type. This technique is more stable 
and provides better predictions than the other two techniques. The problem of capturing extreme 
values may be addressed in the future by appropriate preprocessing such as data normalization. To 
validate the predictions and comparisons more broadly, the use of other 
optimization techniques can also be investigated. Furthermore, metaheuristic 
optimization methods can be combined with various gradient-based optimization methods to get better prediction results. The 
use of hybrid methods between metaheuristic optimization techniques is another possible way to 
better predict extreme data. The optimization techniques considered can also be applied to the search for 
weights in other types of neural networks, such as recurrent neural networks, convolutional neural networks, 
and cascade forward neural networks, and even in other machine learning and deep learning models. Further 
modeling procedures can also be applied to various types of data that are not seasonal. 